Cluster Architecture · Level 3 · Advanced · ~120 min

Full Cluster Backup & Cross-Region Restore with Velero

Implement a complete cluster backup strategy with Velero, including persistent volume snapshots. Destroy the source cluster, then restore all workloads, secrets, and service accounts to a new cluster in a different region, and validate correctness.

Objective

Install Velero with cloud provider snapshot support. Configure scheduled backups to cross-region object storage. Populate the cluster with representative workloads including a stateful application with PVC data. Simulate a full cluster loss by deleting the source cluster. Restore everything to a new target cluster in a different region and validate all workloads, secrets, ConfigMaps, and PVC data are intact.

This exercise involves deleting a cluster. Ensure you are working in a non-production environment. Budget approximately $5-10 in cloud costs for the storage and compute used.

Prerequisites

- Two Kubernetes clusters in different AWS regions (a source and a target), e.g. EKS
- kubectl, the velero CLI, and the AWS CLI installed and authenticated
- IAM permissions to manage S3 buckets, EBS snapshots, and the associated credentials

Steps

01

Install Velero on the source cluster (AWS example)

Velero needs an S3 bucket for backup storage and IAM credentials for EBS snapshot access. Use IRSA (IAM Roles for Service Accounts) in production; use access keys for this exercise.

# Create S3 bucket in a region accessible from both clusters
aws s3 mb s3://velero-backups-crossregion \
  --region us-east-1

# Enable versioning on the bucket
aws s3api put-bucket-versioning \
  --bucket velero-backups-crossregion \
  --versioning-configuration Status=Enabled

# Create velero-credentials file
cat > velero-credentials << EOF
[default]
aws_access_key_id=YOUR_ACCESS_KEY
aws_secret_access_key=YOUR_SECRET_KEY
EOF

# Install Velero with AWS provider
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.8.0 \
  --bucket velero-backups-crossregion \
  --backup-location-config region=us-east-1 \
  --snapshot-location-config region=us-east-1 \
  --secret-file ./velero-credentials \
  --use-volume-snapshots=true

# Verify Velero is running
kubectl get pods -n velero
velero backup-location get
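The access keys above should carry only the permissions Velero actually needs. A minimal policy sketch follows, based on the permissions documented in the velero-plugin-for-aws README; treat the exact action list as an assumption and check the README for your plugin version. The bucket name matches the one created above.

```shell
# Hypothetical minimal IAM policy for this exercise (verify against the
# velero-plugin-for-aws README for your version before using)
cat > velero-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeVolumes",
        "ec2:DescribeSnapshots",
        "ec2:CreateTags",
        "ec2:CreateVolume",
        "ec2:CreateSnapshot",
        "ec2:DeleteSnapshot"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::velero-backups-crossregion/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::velero-backups-crossregion"
    }
  ]
}
EOF
# Sanity-check the file parses as JSON
python3 -m json.tool velero-policy.json > /dev/null && echo "policy is valid JSON"
```

Attach this policy to the IAM user whose keys go into `velero-credentials` (or to the IRSA role in production).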
02

Deploy test workloads including a stateful app

Create workloads that test different restore scenarios: a Secret, and a stateful app (a Deployment with a PVC) that writes data to its volume. The namespace's default ServiceAccount is captured by the backup as well.

# Create test namespace and workloads
kubectl create namespace test-restore

# Create a secret to verify secret restoration
kubectl create secret generic db-credentials \
  --from-literal=username=admin \
  --from-literal=password=supersecret123 \
  -n test-restore

# Deploy stateful app with PVC
cat << 'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
  namespace: test-restore
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateful-app
  namespace: test-restore
spec:
  replicas: 1
  selector:
    matchLabels: {app: stateful-app}
  template:
    metadata:
      labels: {app: stateful-app}
    spec:
      containers:
      - name: app
        image: busybox
        command: [sh, -c, "echo 'restore-test-data' > /data/test.txt && sleep 3600"]
        volumeMounts:
        - mountPath: /data
          name: data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: data-pvc
EOF

# Wait for pod to write test data
kubectl wait pod -l app=stateful-app -n test-restore \
  --for=condition=Ready --timeout=120s

# Verify test data written to PVC
kubectl exec -n test-restore \
  $(kubectl get pod -l app=stateful-app -n test-restore -o name) \
  -- cat /data/test.txt
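Step 07 compares the restored file contents by eye; a checksum makes that comparison mechanical. A small sketch that records the expected hash locally before the cluster is lost (the pod-side hash can later be taken with `kubectl exec ... -- sha256sum /data/test.txt` on the target cluster and compared):

```shell
# Record the expected checksum of the test file. `echo` appends a newline,
# so hash exactly the byte sequence the pod wrote to /data/test.txt
printf 'restore-test-data\n' | sha256sum | cut -d' ' -f1 > expected.sha256
cat expected.sha256
```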
03

Create a Velero Schedule for automated backups

The Schedule object creates recurring backups. For this exercise, trigger a manual backup immediately after creating the schedule.

# Create a backup schedule (every 6 hours, 7 day retention)
cat << 'EOF' | kubectl apply -f -
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: cluster-backup
  namespace: velero
spec:
  schedule: "0 */6 * * *"
  template:
    ttl: 168h    # 7 days retention
    includedNamespaces:
    - "*"
    excludedNamespaces:
    - velero
    - kube-system
    storageLocation: default
    volumeSnapshotLocations:
    - default
    snapshotVolumes: true
    labelSelector: {}
EOF

# Trigger an immediate backup (don't wait for schedule)
velero backup create cluster-backup-manual \
  --include-namespaces="*" \
  --exclude-namespaces velero,kube-system \
  --snapshot-volumes=true \
  --wait

# Check backup status
velero backup describe cluster-backup-manual
velero backup logs cluster-backup-manual | tail -20
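The `--wait` flag blocks the terminal; in automation you may prefer an explicit poll with a timeout. A sketch, assuming `velero backup get <name> -o json` exposes the phase at `.status.phase` (true in recent Velero releases, but treat the output shape as an assumption):

```shell
# Poll a backup until it reaches a terminal phase or the timeout expires
wait_for_backup() {
  local name=$1 timeout=${2:-600} elapsed=0 phase
  while [ "$elapsed" -lt "$timeout" ]; do
    phase=$(velero backup get "$name" -o json 2>/dev/null |
      python3 -c "import sys, json; print(json.load(sys.stdin).get('status', {}).get('phase', ''))")
    case "$phase" in
      Completed|PartiallyFailed|Failed) echo "$phase"; return 0 ;;
    esac
    sleep 10
    elapsed=$((elapsed + 10))
  done
  echo "TimedOut"
  return 1
}
# Usage: wait_for_backup cluster-backup-manual 900
```

`PartiallyFailed` still produces a usable backup for some resources, so check `velero backup logs` before deciding whether to proceed.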
04

Verify backup in S3 and check snapshot

# List backup objects in S3
aws s3 ls s3://velero-backups-crossregion/backups/ --recursive

# Verify EBS snapshot was created
aws ec2 describe-snapshots \
  --filters Name=tag-key,Values=velero.io/backup \
  --query 'Snapshots[*].{ID:SnapshotId,Size:VolumeSize,State:State}' \
  --output table

# Get backup summary from Velero
velero backup get
# STATUS should be: Completed
# ERRORS/WARNINGS should be: 0
05

Install Velero on the target cluster (region B)

Switch kubectl context to the target cluster and install Velero pointing at the same S3 bucket; Velero discovers existing backups in the bucket automatically. To simulate full cluster loss, this is the point at which you can delete the source cluster: the restore depends only on the S3 bucket and the EBS snapshots, not on the source cluster itself.

# Switch to target cluster context
kubectl config use-context arn:aws:eks:us-west-2:ACCOUNT:cluster/target-cluster

# Install Velero on target with same bucket (different region config)
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.8.0 \
  --bucket velero-backups-crossregion \
  --backup-location-config region=us-east-1 \
  --snapshot-location-config region=us-east-1 \
  --secret-file ./velero-credentials \
  --use-volume-snapshots=true

# Wait for Velero to sync backup inventory
kubectl wait deploy/velero -n velero \
  --for=condition=Available --timeout=120s

# Verify backup is visible from target cluster
velero backup get
# cluster-backup-manual should appear with status Completed
06

Restore to the target cluster

Create a Restore object. Velero recreates the Kubernetes objects and provisions new volumes from the EBS snapshots. Keep in mind that EBS snapshots are scoped to a region: a target cluster in a different region can only restore volume data if the snapshots are available there, or if the backup used Velero's file-system backup instead of native snapshots.

# Restore from the backup (creates new resources in target)
velero restore create cluster-restore \
  --from-backup cluster-backup-manual \
  --include-namespaces test-restore \
  --wait

# Check restore status
velero restore describe cluster-restore
velero restore logs cluster-restore | grep -iE "error|warn"

# Monitor restore progress
velero restore get
# STATUS: Completed (may take 5-15 min for PVC snapshots)
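If the restore completes but PVCs stay Pending, the usual cause is the regional scope of EBS snapshots noted above. One workaround is copying the snapshots into the target region first; a hedged sketch using the same `velero.io/backup` tag filter as step 04. Note that Velero tracks the original snapshot IDs, so copying alone may not be enough for Velero-driven volume restores; for fully region-independent volume data, consider installing Velero with file-system backup (`--use-node-agent` plus `--default-volumes-to-fs-backup`), which stores volume contents in the S3 bucket itself.

```shell
# Hypothetical helper: copy every Velero-tagged EBS snapshot from the
# source region into the target region
copy_velero_snapshots() {
  local src_region=$1 dst_region=$2 snap
  for snap in $(aws ec2 describe-snapshots --region "$src_region" \
      --filters Name=tag-key,Values=velero.io/backup \
      --query 'Snapshots[*].SnapshotId' --output text); do
    aws ec2 copy-snapshot --region "$dst_region" \
      --source-region "$src_region" --source-snapshot-id "$snap"
  done
}
# Usage: copy_velero_snapshots us-east-1 us-west-2
```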
07

Validate all restored resources

Run through this validation checklist to confirm the restore is complete and correct.

# 1. Verify namespace was restored
kubectl get namespace test-restore

# 2. Verify secret was restored (check keys exist)
kubectl get secret db-credentials -n test-restore \
  -o jsonpath='{.data}' | python3 -c "
import sys, json, base64
d = json.load(sys.stdin)
for k, v in d.items():
    print(f'{k}: {base64.b64decode(v).decode()}')"

# 3. Verify PVC was restored and bound
kubectl get pvc -n test-restore
# STATUS should be: Bound

# 4. Verify pod is running
kubectl get pods -n test-restore

# 5. Verify PVC data is intact
kubectl exec -n test-restore \
  $(kubectl get pod -l app=stateful-app -n test-restore -o name) \
  -- cat /data/test.txt
# Expected: restore-test-data

# 6. Verify ServiceAccount was restored
kubectl get serviceaccounts -n test-restore
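The checklist above can also be scripted. A tiny helper (a sketch; the resource names match those created in step 02) turns each check into a pass/fail line:

```shell
# Report whether a resource exists; returns nonzero when it is missing
check() {
  if kubectl get "$@" > /dev/null 2>&1; then
    echo "OK: $*"
  else
    echo "MISSING: $*"
    return 1
  fi
}
# Usage against the target cluster:
#   check namespace test-restore
#   check secret db-credentials -n test-restore
#   check pvc data-pvc -n test-restore
#   check deployment stateful-app -n test-restore
```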
08

Clean up resources

# Delete backup from Velero (also deletes S3 objects + snapshots)
velero backup delete cluster-backup-manual --confirm

# Remove test namespace from target cluster
kubectl delete namespace test-restore

# Remove Velero from both clusters (run once per kubectl context);
# unlike deleting the namespace alone, this also removes the Velero CRDs
velero uninstall --force

# Delete the S3 bucket. Versioning was enabled in step 01, so old object
# versions must be purged before `rb` will succeed
aws s3 rm s3://velero-backups-crossregion --recursive
# Purge noncurrent versions (if delete markers remain, rerun with
# DeleteMarkers[] in place of Versions[] in the query)
aws s3api delete-objects --bucket velero-backups-crossregion \
  --delete "$(aws s3api list-object-versions \
      --bucket velero-backups-crossregion --output json \
      --query '{Objects: Versions[].{Key:Key,VersionId:VersionId}}')"
aws s3 rb s3://velero-backups-crossregion

Success Criteria

- velero backup get reports the backup as Completed with 0 errors and 0 warnings
- The restore on the target cluster completes and every object in test-restore is recreated
- The restored PVC is Bound and /data/test.txt contains restore-test-data
- The whole round trip works across two regions using only the S3 bucket and snapshots

Key Concepts

- Velero Schedule objects, backup TTL, and retention
- BackupStorageLocation (object storage) versus VolumeSnapshotLocation (EBS snapshots)
- Cross-region disaster recovery and the regional scope of EBS snapshots

Further Reading

- Velero documentation (velero.io/docs): disaster recovery and restore reference
- velero-plugin-for-aws README: required IAM permissions and plugin configuration