Objective
Over-provisioning resource requests wastes cluster capacity (and money). Under-provisioning causes OOMKills and CPU throttling. Goldilocks wraps VPA in recommendation mode — it never changes pods automatically, only suggests optimal requests and limits based on observed usage. This exercise installs both tools, runs several workloads, and applies the recommendations to reduce wasted capacity.
Prerequisites
- Kubernetes cluster with metrics-server installed
- Helm installed
- kubectl with port-forward capability
Steps
Install VPA (Vertical Pod Autoscaler)
# Install VPA CRDs and admission controller
helm repo add cowboysysop https://cowboysysop.github.io/charts/
helm repo update
helm install vpa cowboysysop/vertical-pod-autoscaler \
  --namespace kube-system \
  --set "admissionController.enabled=true" \
  --wait

# Verify VPA is running
kubectl get pods -n kube-system | grep vpa
Install Goldilocks
# Install Goldilocks
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm repo update
helm install goldilocks fairwinds-stable/goldilocks \
--namespace goldilocks \
--create-namespace \
--set dashboard.enabled=true \
--wait
kubectl get pods -n goldilocks

Enable Goldilocks on namespaces
Goldilocks watches namespaces with the goldilocks.fairwinds.com/enabled=true label and creates VPA objects in recommendation mode for every Deployment.
# Enable Goldilocks on the default namespace
kubectl label namespace default \
  goldilocks.fairwinds.com/enabled=true

# Enable on staging namespace too
kubectl create namespace staging --dry-run=client -o yaml | \
  kubectl apply -f -
kubectl label namespace staging \
  goldilocks.fairwinds.com/enabled=true

# Verify VPA objects are created for each Deployment
kubectl get vpa --all-namespaces
# Should see a VPA for each Deployment in labeled namespaces
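As a reference, here is a sketch of the kind of VPA object Goldilocks creates for a Deployment such as api-service. The key detail is `updateMode: "Off"`, which keeps the VPA in recommendation-only mode so it never evicts or mutates pods. (Field names follow the `autoscaling.k8s.io/v1` API; the `goldilocks-` name prefix is Goldilocks' naming convention.)

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: goldilocks-api-service
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  updatePolicy:
    updateMode: "Off"   # recommendation only -- never evicts or rewrites pods
```

You never apply this manifest yourself; Goldilocks reconciles one per Deployment in each labeled namespace.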
Deploy intentionally over-provisioned workloads
Create workloads with inflated resource requests to simulate the typical state of a cluster that hasn't been right-sized.
# Deploy 2 over-provisioned workloads
cat << 'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels: {app: api-service}
  template:
    metadata:
      labels: {app: api-service}
    spec:
      containers:
      - name: api
        image: nginx:alpine
        resources:
          requests: {cpu: "2", memory: "2Gi"}  # WAY over-provisioned
          limits: {cpu: "4", memory: "4Gi"}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
  namespace: default
spec:
  replicas: 5
  selector:
    matchLabels: {app: worker}
  template:
    metadata:
      labels: {app: worker}
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "while true; do echo working; sleep 5; done"]
        resources:
          requests: {cpu: "500m", memory: "1Gi"}  # Over-provisioned
          limits: {cpu: "1", memory: "2Gi"}
EOF

Generate load to accelerate recommendations
# Generate CPU load on one api-service pod
kubectl exec -n default \
  $(kubectl get pod -l app=api-service -o name | head -1) \
  -- sh -c "
    for i in \$(seq 1 10); do
      dd if=/dev/zero of=/dev/null bs=1M count=100 &
    done
    wait
  " &

# Repeat for 2-3 minutes to build VPA history
# In production: wait at least 24 hours for accurate recommendations
sleep 120

# Check whether VPA has generated recommendations yet
kubectl describe vpa -n default goldilocks-api-service | grep -A20 "Recommendation"
Access the Goldilocks dashboard
# Port-forward to the Goldilocks dashboard
kubectl port-forward svc/goldilocks-dashboard \
  -n goldilocks 8080:80 &

# Open http://localhost:8080 in your browser
# The dashboard shows:
# - Current requests vs recommended requests per container
# - Two recommendation profiles: Guaranteed and Burstable QoS
# - Ready-to-apply YAML snippets for each container

# Get VPA recommendations via kubectl (stdlib json; no PyYAML needed)
kubectl get vpa goldilocks-api-service -n default -o json | \
python3 -c "
import sys, json
data = json.load(sys.stdin)
recs = data.get('status', {}).get('recommendation', {}).get('containerRecommendations', [])
for c in recs:
    print(f'Container: {c[\"containerName\"]}')
    print(f'  Target (recommended requests): {c[\"target\"]}')
    print(f'  Lower bound (minimum safe): {c[\"lowerBound\"]}')
    print(f'  Upper bound (maximum recommended): {c[\"upperBound\"]}')
    print()
"
Apply recommendations and measure improvement
# Record current resource consumption
kubectl top nodes
kubectl top pods --all-namespaces | sort -k4 -rn | head -20

# Get recommended values from VPA (example output)
# api-service: target cpu=25m, memory=32Mi (was 2000m, 2048Mi)
# worker: target cpu=10m, memory=16Mi (was 500m, 1024Mi)

# Apply the recommendations
kubectl patch deployment api-service -n default --type=json -p='[
  {"op":"replace","path":"/spec/template/spec/containers/0/resources/requests/cpu","value":"25m"},
  {"op":"replace","path":"/spec/template/spec/containers/0/resources/requests/memory","value":"32Mi"},
  {"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/cpu","value":"500m"},
  {"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/memory","value":"128Mi"}
]'

# Wait for rollout
kubectl rollout status deployment/api-service -n default

# Measure headroom improvement
kubectl describe nodes | grep -A10 "Allocated resources"
# Compare CPU/memory allocated before and after
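To put a number on the reclaimed capacity, the example targets above can be plugged into shell arithmetic. These before/after millicore values are the illustrative figures from the comments, not live cluster data:

```shell
# Illustrative values from the example output above (millicores):
# api-service: 3 replicas, 2000m -> 25m; worker: 5 replicas, 500m -> 10m
OLD_M_CPU=$((3 * 2000 + 5 * 500))
NEW_M_CPU=$((3 * 25 + 5 * 10))
SAVED=$((OLD_M_CPU - NEW_M_CPU))
echo "CPU requests: ${OLD_M_CPU}m -> ${NEW_M_CPU}m (freed ${SAVED}m, $((SAVED * 100 / OLD_M_CPU))%)"
```

On a real cluster, substitute the totals from `kubectl describe nodes` before and after the patch; the same arithmetic applies to memory.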
Success Criteria
- A VPA object exists for every Deployment in the labeled namespaces
- The Goldilocks dashboard shows target, lower-bound, and upper-bound recommendations per container
- api-service and worker roll out successfully with the reduced requests and limits
- Node "Allocated resources" show substantially more unallocated CPU and memory than before the patch
Further Reading
- Goldilocks documentation — goldilocks.docs.fairwinds.com
- VPA documentation — github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler
- Resource requests best practices — kubernetes.io/docs/concepts/configuration/manage-resources-containers
- Right-sizing with VPA — learnk8s.io/setting-cpu-memory-limits-requests