Objective
Install Velero with cloud provider snapshot support. Configure scheduled backups to cross-region object storage. Populate the cluster with representative workloads including a stateful application with PVC data. Simulate a full cluster loss by deleting the source cluster. Restore everything to a new target cluster in a different region and validate all workloads, secrets, ConfigMaps, and PVC data are intact.
Prerequisites
- Source cluster in region A (e.g., us-east-1) with cluster-admin access
- Target cluster pre-provisioned in region B (e.g., us-west-2) — same K8s version
- S3 bucket or Azure Blob container in a region accessible from both clusters
- Velero CLI installed (velero.io/docs/latest/basic-install)
- Cloud provider IAM permissions for snapshot creation and S3/Blob writes
Steps
Install Velero on the source cluster (AWS example)
Velero needs an S3 bucket for backup storage and IAM credentials for EBS snapshot access. Use IRSA (IAM Roles for Service Accounts) in production; use access keys for this exercise.
# Create S3 bucket in a region accessible from both clusters
aws s3 mb s3://velero-backups-crossregion \
  --region us-east-1

# Enable versioning on the bucket
aws s3api put-bucket-versioning \
  --bucket velero-backups-crossregion \
  --versioning-configuration Status=Enabled

# Create velero-credentials file
cat > velero-credentials << EOF
[default]
aws_access_key_id=YOUR_ACCESS_KEY
aws_secret_access_key=YOUR_SECRET_KEY
EOF

# Install Velero with AWS provider
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.8.0 \
  --bucket velero-backups-crossregion \
  --backup-location-config region=us-east-1 \
  --snapshot-location-config region=us-east-1 \
  --secret-file ./velero-credentials \
  --use-volume-snapshots=true

# Verify Velero is running
kubectl get pods -n velero
velero backup-location get
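The final verification step above can be automated. The following is a minimal sketch, assuming the storage location is named default and lives in the velero namespace (the names used by the install command above). The function name wait_for_bsl and its retry parameters are illustrative, not part of Velero.

```sh
# Poll the BackupStorageLocation until Velero reports it as Available.
# Usage: wait_for_bsl [name] [attempts] [sleep_seconds]
wait_for_bsl() {
  local name="${1:-default}" attempts="${2:-30}" delay="${3:-5}"
  local phase i=1
  while [ "$i" -le "$attempts" ]; do
    # .status.phase is "Available" once Velero can reach the bucket
    phase="$(kubectl get backupstoragelocation "$name" -n velero \
      -o jsonpath='{.status.phase}' 2>/dev/null || true)"
    if [ "$phase" = "Available" ]; then
      echo "BackupStorageLocation '$name' is Available"
      return 0
    fi
    echo "attempt $i/$attempts: phase='${phase:-unknown}'" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

Calling wait_for_bsl default 30 5 right after the install gives the Velero pod up to roughly two and a half minutes to validate the bucket before any backup is attempted.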
Deploy test workloads including a stateful app
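Create workloads that exercise different restore scenarios: a stateless deployment, a Secret, a ConfigMap, a ServiceAccount, and a stateful app with PVC data. The commands below cover the namespace, the Secret, and the stateful app; the namespace's default ServiceAccount is created automatically.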
# Create test namespace and workloads
kubectl create namespace test-restore

# Create a secret to verify secret restoration
kubectl create secret generic db-credentials \
  --from-literal=username=admin \
  --from-literal=password=supersecret123 \
  -n test-restore

# Deploy stateful app with PVC
cat << 'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
  namespace: test-restore
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateful-app
  namespace: test-restore
spec:
  replicas: 1
  selector:
    matchLabels: {app: stateful-app}
  template:
    metadata:
      labels: {app: stateful-app}
    spec:
      containers:
        - name: app
          image: busybox
          command: [sh, -c, "echo 'restore-test-data' > /data/test.txt && sleep 3600"]
          volumeMounts:
            - mountPath: /data
              name: data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: data-pvc
EOF

# Wait for pod to write test data
kubectl wait pod -l app=stateful-app -n test-restore \
  --for=condition=Ready --timeout=120s

# Verify test data written to PVC
kubectl exec -n test-restore \
  $(kubectl get pod -l app=stateful-app -n test-restore -o name) \
  -- cat /data/test.txt
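The scenario list for this step also names a ConfigMap, a ServiceAccount, and a stateless deployment, which the commands above do not create. One possible way to add them is sketched below. The object names app-config, app-sa, and stateless-app, the nginx image, and the ConfigMap key are hypothetical placeholders chosen for this example.

```sh
# Create the remaining test objects in an existing namespace.
# All object names below are hypothetical placeholders.
create_extra_workloads() {
  local ns="${1:-test-restore}"
  # ConfigMap with a known key/value to compare after the restore
  kubectl create configmap app-config \
    --from-literal=environment=dr-test -n "$ns" &&
  # Dedicated ServiceAccount (the namespace's "default" one always exists)
  kubectl create serviceaccount app-sa -n "$ns" &&
  # Stateless deployment with no volumes attached
  kubectl create deployment stateless-app \
    --image=nginx --replicas=2 -n "$ns"
}
```

Run create_extra_workloads test-restore before taking the backup so these objects are captured along with the stateful app.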
Create a Velero Schedule for automated backups
The Schedule object creates recurring backups. For this exercise, trigger a manual backup immediately after creating the schedule.
# Create a backup schedule (every 6 hours, 7 day retention)
cat << 'EOF' | kubectl apply -f -
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: cluster-backup
  namespace: velero
spec:
  schedule: "0 */6 * * *"
  template:
    ttl: 168h  # 7 days retention
    includedNamespaces:
      - "*"
    excludedNamespaces:
      - velero
      - kube-system
    storageLocation: default
    volumeSnapshotLocations:
      - default
    snapshotVolumes: true
    labelSelector: {}
EOF

# Trigger an immediate backup (don't wait for schedule)
velero backup create cluster-backup-manual \
  --include-namespaces="*" \
  --exclude-namespaces velero,kube-system \
  --snapshot-volumes=true \
  --wait

# Check backup status
velero backup describe cluster-backup-manual
velero backup logs cluster-backup-manual | tail -20
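The same schedule can also be created with the Velero CLI, and an on-demand backup can reuse the schedule's template instead of repeating its flags. The sketch below assumes the schedule name cluster-backup from the manifest above; the wrapper function names are illustrative only.

```sh
# Create the schedule with the velero CLI instead of a manifest.
create_schedule() {
  velero schedule create cluster-backup \
    --schedule="0 */6 * * *" \
    --ttl 168h \
    --exclude-namespaces velero,kube-system \
    --snapshot-volumes=true
}

# Run an on-demand backup that reuses the schedule's template.
# Velero generates the backup name from the schedule name and a timestamp.
backup_from_schedule() {
  velero backup create --from-schedule cluster-backup --wait
}
```

Using --from-schedule keeps the manual backup and the scheduled backups identical in scope, which makes the restore test more representative. The later steps refer to the backup by the name cluster-backup-manual, so substitute the generated name if you take this route.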
Verify backup in S3 and check snapshot
# List backup objects in S3
aws s3 ls s3://velero-backups-crossregion/backups/ --recursive

# Verify EBS snapshot was created
aws ec2 describe-snapshots \
  --filters Name=tag-key,Values=velero.io/backup \
  --query 'Snapshots[*].{ID:SnapshotId,Size:VolumeSize,State:State}' \
  --output table

# Get backup summary from Velero
velero backup get
# STATUS should be: Completed
# ERRORS/WARNINGS should be: 0
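The status expectations above (Completed, zero errors, zero warnings) can be turned into a scripted check. This is a sketch that reads the Backup object's status fields through kubectl; the function name check_backup is hypothetical. Velero omits the errors and warnings fields when they are zero, so empty values are treated as 0.

```sh
# Succeed only if the backup finished as Completed with no errors or warnings.
check_backup() {
  local name="$1" status phase errors warnings
  # Fields are joined with "|" so that empty values keep their position
  status="$(kubectl get backup "$name" -n velero \
    -o jsonpath='{.status.phase}|{.status.errors}|{.status.warnings}' \
    2>/dev/null || true)"
  IFS='|' read -r phase errors warnings <<EOF
$status
EOF
  echo "backup=$name phase=${phase:-unknown} errors=${errors:-0} warnings=${warnings:-0}"
  [ "$phase" = "Completed" ] &&
    [ "${errors:-0}" -eq 0 ] &&
    [ "${warnings:-0}" -eq 0 ]
}
```

For example, check_backup cluster-backup-manual || exit 1 stops a pipeline before the source cluster is deleted if the backup is not clean.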
Install Velero on the target cluster (region B)
Switch kubectl context to the target cluster and install Velero pointing to the same S3 bucket. Velero will discover existing backups automatically.
# Switch to target cluster context
kubectl config use-context arn:aws:eks:us-west-2:ACCOUNT:cluster/target-cluster

# Install Velero on target with same bucket
# (the bucket stays in us-east-1; the snapshot location points at the target region,
# because a VolumeSnapshotLocation must match the region of the cluster's volumes)
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.8.0 \
  --bucket velero-backups-crossregion \
  --backup-location-config region=us-east-1 \
  --snapshot-location-config region=us-west-2 \
  --secret-file ./velero-credentials \
  --use-volume-snapshots=true

# Wait for the Velero deployment to become available (backup sync starts after this)
kubectl wait deploy/velero -n velero \
  --for=condition=Available --timeout=120s

# Verify backup is visible from target cluster
velero backup get
# cluster-backup-manual should appear with status Completed
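Two optional safeguards on the target cluster are sketched below. The first switches the storage location to read-only so the target cannot modify or delete backups during the restore, an approach the Velero disaster recovery documentation describes. The second polls until the backup has actually been synced, since the deployment becoming Available does not guarantee that. Both function names are illustrative.

```sh
# Set the default storage location's access mode (ReadOnly or ReadWrite).
set_bsl_access_mode() {
  local mode="$1"
  kubectl patch backupstoragelocation default -n velero \
    --type merge --patch "{\"spec\":{\"accessMode\":\"$mode\"}}"
}

# Poll until the named backup has been synced into this cluster.
# Usage: wait_for_backup_sync <backup> [attempts] [sleep_seconds]
wait_for_backup_sync() {
  local name="$1" attempts="${2:-30}" delay="${3:-10}" i=1
  while [ "$i" -le "$attempts" ]; do
    if kubectl get backup "$name" -n velero >/dev/null 2>&1; then
      echo "backup '$name' is visible"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

A typical sequence is set_bsl_access_mode ReadOnly, then wait_for_backup_sync cluster-backup-manual, then the restore. Switch back with set_bsl_access_mode ReadWrite before deleting backups from this cluster, because deletion needs write access.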
Restore to the target cluster
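Create a Restore object. Velero recreates the Kubernetes objects from the manifests stored in the bucket and provisions the PVC volumes from the EBS snapshots. EBS snapshots are regional, so the volume restore succeeds only if the snapshots are available in the target region or the backup was taken with file-system backup (see Key Concepts).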
# Restore from the backup (creates new resources in target)
velero restore create cluster-restore \
  --from-backup cluster-backup-manual \
  --include-namespaces test-restore \
  --wait

# Check restore status
velero restore describe cluster-restore
velero restore logs cluster-restore | grep -i "error\|warn"

# Monitor restore progress
velero restore get
# STATUS: Completed (may take 5-15 min for PVC snapshots)
Validate all restored resources
Run through this validation checklist to confirm the restore is complete and correct.
# 1. Verify namespace was restored
kubectl get namespace test-restore

# 2. Verify secret was restored (check keys exist)
kubectl get secret db-credentials -n test-restore \
  -o jsonpath='{.data}' | python3 -c "
import sys, json, base64
d = json.load(sys.stdin)
for k, v in d.items():
    print(f'{k}: {base64.b64decode(v).decode()}')"

# 3. Verify PVC was restored and bound
kubectl get pvc -n test-restore
# STATUS should be: Bound

# 4. Verify pod is running
kubectl get pods -n test-restore

# 5. Verify PVC data is intact
kubectl exec -n test-restore \
  $(kubectl get pod -l app=stateful-app -n test-restore -o name) \
  -- cat /data/test.txt
# Expected: restore-test-data

# 6. Verify ServiceAccount was restored
kubectl get serviceaccounts -n test-restore
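The checklist above can be condensed into a pass/fail script. The sketch below covers four of the checks (namespace, one secret key, PVC phase, PVC data) and uses the object names from the earlier steps. The helper names check and validate_restore are hypothetical, and the script only reports on what it is told to compare.

```sh
FAILURES=0

# Compare an expected value with the actual one and record the result.
check() {
  local label="$1" expected="$2" actual="$3"
  if [ "$expected" = "$actual" ]; then
    echo "PASS  $label"
  else
    echo "FAIL  $label (expected '$expected', got '$actual')"
    FAILURES=$((FAILURES + 1))
  fi
}

# Run the restore checks; returns non-zero if any of them failed.
validate_restore() {
  local ns="${1:-test-restore}" pod
  FAILURES=0
  check "namespace exists" "$ns" \
    "$(kubectl get namespace "$ns" -o jsonpath='{.metadata.name}' 2>/dev/null || true)"
  check "secret username" "admin" \
    "$(kubectl get secret db-credentials -n "$ns" \
      -o jsonpath='{.data.username}' 2>/dev/null | base64 --decode 2>/dev/null || true)"
  check "pvc bound" "Bound" \
    "$(kubectl get pvc data-pvc -n "$ns" -o jsonpath='{.status.phase}' 2>/dev/null || true)"
  pod="$(kubectl get pod -l app=stateful-app -n "$ns" -o name 2>/dev/null | head -n 1)"
  check "pvc data" "restore-test-data" \
    "$(kubectl exec -n "$ns" "$pod" -- cat /data/test.txt 2>/dev/null || true)"
  [ "$FAILURES" -eq 0 ]
}
```

Running validate_restore test-restore against the target cluster prints one PASS or FAIL line per check and returns a non-zero status on any failure, which makes it usable in a recurring DR drill.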
Clean up resources
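# Delete backup from Velero (also deletes S3 objects + snapshots)
# Deletion is asynchronous; wait until `velero backup get` no longer lists the backup
velero backup delete cluster-backup-manual --confirm

# Remove test namespace from target cluster
kubectl delete namespace test-restore

# Remove Velero from both clusters (repeat in each kubectl context)
kubectl delete namespace velero

# Delete S3 bucket (empty it first)
# Versioning is enabled, so noncurrent object versions and delete markers
# must also be removed before `aws s3 rb` succeeds
aws s3 rm s3://velero-backups-crossregion --recursive
aws s3 rb s3://velero-backups-crossregion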
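Because versioning was enabled on the bucket, a recursive remove leaves noncurrent versions and delete markers behind, and the bucket removal then fails. One way to purge them is sketched below, assuming fewer than 1000 entries of each kind (the per-call limit of delete-objects) and the AWS CLI's default JSON formatting for the empty-list check. The function name purge_versioned_bucket is hypothetical.

```sh
# Delete every object version and delete marker, then remove the bucket.
# Assumes at most 1000 versions and 1000 delete markers.
purge_versioned_bucket() {
  local bucket="$1" kind payload
  for kind in Versions DeleteMarkers; do
    # Build the {"Objects": [{Key, VersionId}, ...]} payload for delete-objects
    payload="$(aws s3api list-object-versions --bucket "$bucket" \
      --query "{Objects: ${kind}[].{Key:Key,VersionId:VersionId}}" \
      --output json)"
    # Skip the delete call when there is nothing of this kind to remove
    case "$payload" in
      *'"Objects": null'*) continue ;;
    esac
    aws s3api delete-objects --bucket "$bucket" --delete "$payload"
  done
  aws s3 rb "s3://$bucket"
}
```

For larger buckets, a lifecycle rule that expires noncurrent versions is usually a better fit than looping over delete-objects calls.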
Success Criteria
Key Concepts
- BackupStorageLocation: where Velero stores backup metadata and object manifests (S3/Azure Blob/GCS)
- VolumeSnapshotLocation: where volume snapshots are created (cloud-provider specific; must be in the same region as the volumes)
- Cross-region restore limitation: EBS snapshots must be copied to the target region before restore; Velero does not do this automatically. Use the restic/kopia file-system backup mode for true cross-region PVC restore
- Velero file-system backup: use --default-volumes-to-fs-backup for cloud-agnostic PVC backup via restic/kopia; slower but region-independent
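The file-system backup mode described above could be enabled roughly as follows. This sketch reuses the bucket, plugin version, and credentials file from the install step, and assumes a Velero release that ships the node agent (1.10 or later). The wrapper function names are illustrative.

```sh
# Install Velero with the node agent so pod volumes are backed up at file level.
install_with_fs_backup() {
  velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.8.0 \
    --bucket velero-backups-crossregion \
    --backup-location-config region=us-east-1 \
    --secret-file ./velero-credentials \
    --use-node-agent \
    --use-volume-snapshots=false
}

# Back up a namespace, storing volume contents in the bucket instead of EBS snapshots.
# Usage: backup_with_fs_backup <backup-name>
backup_with_fs_backup() {
  velero backup create "$1" \
    --include-namespaces test-restore \
    --default-volumes-to-fs-backup \
    --wait
}
```

Since the volume data then lives in the object store alongside the manifests, the target cluster needs only access to the bucket, plus its own node agent, to restore PVC contents in another region.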
Further Reading
- Velero documentation — velero.io/docs/latest
- Velero AWS plugin — github.com/vmware-tanzu/velero-plugin-for-aws
- File system backup with Kopia — velero.io/docs/latest/file-system-backup
- Disaster recovery planning — aws.amazon.com/blogs/containers/backup-and-restore-your-amazon-eks-cluster-resources-using-velero