Cluster Architecture L1 · INTRO ~45 min

Provision a Multi-Zone AKS Cluster with Terraform

Provision a production-grade AKS cluster across three availability zones using Terraform. Configure separate system and user node pools, enable zone-aware scheduling, and verify zone distribution using kubectl.

Objective

Use Terraform to provision an Azure Kubernetes Service (AKS) cluster with nodes distributed across three availability zones. Understand how Azure zone infrastructure maps to Kubernetes node topology labels and how zone-aware scheduling works by default. By the end you will have a working cluster, understand its resource model, and be able to verify zone distribution through kubectl output.

Prerequisites

- Terraform >= 1.5 and the Azure CLI (az), logged in to a subscription that can create resource groups and AKS clusters
- kubectl and jq for the verification steps
- An Azure Storage account for remote state, or delete the backend block in providers.tf to use local state

Steps

01

Create the project structure

Set up your Terraform project with separate files for provider config, main resources, variables, and outputs. This separation makes the module reusable and testable.

mkdir aks-multizone && cd aks-multizone
touch main.tf variables.tf outputs.tf providers.tf
02

Configure providers.tf

Pin the AzureRM provider to a version range so runs are reproducible, and include the empty features block the provider requires. The backend block stores state in Azure Blob Storage; point it at an existing storage account (the names shown are placeholders) or delete the block to use local state.

# providers.tf
terraform {
  required_version = ">= 1.5"
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.85"
    }
  }
  # Recommended: store state in Azure Blob Storage
  backend "azurerm" {
    resource_group_name  = "tf-state-rg"
    storage_account_name = "tfstateaccount"
    container_name       = "tfstate"
    key                  = "aks-multizone.tfstate"
  }
}

provider "azurerm" {
  features {}
}
03

Define variables.tf

Parameterise the configuration so the same module can provision dev, staging, and production clusters by changing variable files rather than code.

# variables.tf
variable "resource_group_name" {
  type    = string
  default = "aks-multizone-rg"
}

variable "location" {
  type    = string
  default = "eastus2"
  # Zones available: East US 2, West US 2, West Europe, etc.
}

variable "cluster_name" {
  type    = string
  default = "aks-multizone-cluster"
}

variable "kubernetes_version" {
  type    = string
  default = "1.29"
}

variable "system_node_count" {
  type    = number
  default = 3  # One per zone
}

variable "system_node_vm_size" {
  type    = string
  default = "Standard_D4s_v5"
}

variable "user_node_count" {
  type    = number
  default = 3
}

variable "user_node_vm_size" {
  type    = string
  default = "Standard_D8s_v5"
}

variable "tags" {
  type = map(string)
  default = {
    environment = "training"
    managed_by  = "terraform"
  }
}
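For instance, a production cluster can reuse this module with nothing but a variable file. The file below is an illustrative sketch; every name and size in it is a placeholder:

```hcl
# production.tfvars (illustrative values)
# Usage: terraform plan -var-file=production.tfvars
resource_group_name = "aks-prod-rg"
cluster_name        = "aks-prod-cluster"
location            = "westeurope"
user_node_count     = 6                  # two user nodes per zone
user_node_vm_size   = "Standard_D16s_v5"

tags = {
  environment = "production"
  managed_by  = "terraform"
}
```

Any variable not listed keeps its default from variables.tf.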
04

Write main.tf with zone-aware node pools

The critical configuration is the zones argument on each node pool. AKS will distribute nodes evenly across zones 1, 2, and 3 within the region. The system pool runs cluster-critical workloads (CoreDNS, metrics-server); the user pool runs application workloads.

# main.tf
resource "azurerm_resource_group" "main" {
  name     = var.resource_group_name
  location = var.location
  tags     = var.tags
}

resource "azurerm_kubernetes_cluster" "main" {
  name                = var.cluster_name
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  dns_prefix          = var.cluster_name
  kubernetes_version  = var.kubernetes_version
  oidc_issuer_enabled = true # required for the oidc_issuer_url output

  # Default node pool; AKS runs it in System mode regardless of its name
  default_node_pool {
    name                = "system"
    node_count          = var.system_node_count
    vm_size             = var.system_node_vm_size
    zones               = ["1", "2", "3"]
    os_disk_size_gb     = 128
    os_disk_type        = "Managed"
    type                = "VirtualMachineScaleSets"

    node_labels = {
      "role" = "system"
    }

    # Taint to prevent user workloads on system nodes
    only_critical_addons_enabled = true

    upgrade_settings {
      max_surge = "33%"
    }
  }

  identity {
    type = "SystemAssigned"
  }

  network_profile {
    network_plugin    = "azure" # Azure CNI
    network_policy    = "calico"
    load_balancer_sku = "standard"
  }

  # Azure Monitor / OMS integration
  oms_agent {
    log_analytics_workspace_id = azurerm_log_analytics_workspace.main.id
  }

  azure_active_directory_role_based_access_control {
    managed            = true
    azure_rbac_enabled = true
  }

  tags = var.tags
}

# User node pool, spread across the same three zones
resource "azurerm_kubernetes_cluster_node_pool" "user" {
  name                  = "user"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.main.id
  vm_size               = var.user_node_vm_size
  node_count            = var.user_node_count
  zones                 = ["1", "2", "3"]
  os_disk_size_gb       = 256
  os_disk_type          = "Managed"
  mode                  = "User"

  node_labels = {
    "role" = "user"
  }

  upgrade_settings {
    max_surge = "33%"
  }

  tags = var.tags
}

resource "azurerm_log_analytics_workspace" "main" {
  name                = "${var.cluster_name}-law"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  sku                 = "PerGB2018"
  retention_in_days   = 30
  tags                = var.tags
}
05

Define outputs.tf

# outputs.tf
output "cluster_name" {
  value = azurerm_kubernetes_cluster.main.name
}

output "kube_config_raw" {
  value     = azurerm_kubernetes_cluster.main.kube_config_raw
  sensitive = true
}

output "node_resource_group" {
  value = azurerm_kubernetes_cluster.main.node_resource_group
  # This is where AKS places VMs, disks, load balancers
}

output "oidc_issuer_url" {
  value = azurerm_kubernetes_cluster.main.oidc_issuer_url
}
06

Initialise, plan, and apply

Run the standard Terraform workflow. Review the plan carefully and confirm the zones argument appears on both node pools before applying. The apply takes approximately 8-12 minutes.

# Initialise providers and backend
terraform init

# Preview changes — review zones config in plan output
terraform plan -out=tfplan

# Apply (takes ~10 min)
terraform apply tfplan

# Retrieve kubeconfig
az aks get-credentials \
  --resource-group aks-multizone-rg \
  --name aks-multizone-cluster \
  --overwrite-existing
07

Verify zone distribution with kubectl

Check that nodes are distributed across zones using the topology.kubernetes.io/zone label. Each zone should have at least one node from both node pools.

# View nodes with zone and agentpool label columns
kubectl get nodes \
  --label-columns=topology.kubernetes.io/zone,kubernetes.azure.com/agentpool

# Expected output, extra columns omitted (6 nodes: 3 system + 3 user):
# NAME                STATUS   ZONE         AGENTPOOL
# aks-system-...0     Ready    eastus2-1    system
# aks-system-...1     Ready    eastus2-2    system
# aks-system-...2     Ready    eastus2-3    system
# aks-user-...0       Ready    eastus2-1    user
# aks-user-...1       Ready    eastus2-2    user
# aks-user-...2       Ready    eastus2-3    user

# Verify node labels in detail
kubectl get nodes -o json | jq '
  .items[] | {
    name: .metadata.name,
    zone: .metadata.labels["topology.kubernetes.io/zone"],
    pool: .metadata.labels["kubernetes.azure.com/agentpool"]
  }
'
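The same check can be automated. The sketch below is a standalone example, not part of the lab's tooling: it groups zone labels by agent pool and flags any pool spanning fewer than three zones. The embedded sample JSON stands in for live `kubectl get nodes -o json` output and is deliberately under-spread so the check fires.

```python
import json
from collections import defaultdict

# Minimal stand-in for `kubectl get nodes -o json` (under-spread on purpose).
sample = json.loads("""
{"items": [
  {"metadata": {"name": "aks-system-0", "labels": {
    "kubernetes.azure.com/agentpool": "system",
    "topology.kubernetes.io/zone": "eastus2-1"}}},
  {"metadata": {"name": "aks-system-1", "labels": {
    "kubernetes.azure.com/agentpool": "system",
    "topology.kubernetes.io/zone": "eastus2-2"}}},
  {"metadata": {"name": "aks-user-0", "labels": {
    "kubernetes.azure.com/agentpool": "user",
    "topology.kubernetes.io/zone": "eastus2-1"}}}
]}
""")

def zones_per_pool(nodes: dict) -> dict:
    """Map each agent pool to the set of zones its nodes occupy."""
    pools = defaultdict(set)
    for item in nodes["items"]:
        labels = item["metadata"]["labels"]
        pools[labels["kubernetes.azure.com/agentpool"]].add(
            labels["topology.kubernetes.io/zone"])
    return dict(pools)

for pool, zones in sorted(zones_per_pool(sample).items()):
    status = "ok" if len(zones) >= 3 else "UNDER-SPREAD"
    print(f"{pool}: {sorted(zones)} ({status})")
# system: ['eastus2-1', 'eastus2-2'] (UNDER-SPREAD)
# user: ['eastus2-1'] (UNDER-SPREAD)
```

Against the real cluster you would replace the sample with the parsed output of `kubectl get nodes -o json` and expect both pools to report three zones.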
08

Review the Azure resource model

AKS creates a "node resource group" (MC_ prefixed) that contains the actual VM scale sets, NICs, disks, and load balancers. Understanding this separation is important for cost allocation and IAM.

# List resources in the node resource group
NODE_RG=$(az aks show \
  --resource-group aks-multizone-rg \
  --name aks-multizone-cluster \
  --query nodeResourceGroup -o tsv)

az resource list --resource-group $NODE_RG \
  --output table --query "[].{Name:name, Type:type}"

# Inspect VMSS zone distribution
az vmss list --resource-group $NODE_RG \
  --query "[].{Name:name, Zones:zones}" \
  --output table
The MC_ resource group is managed by AKS. Do not modify resources in it directly — Terraform or the AKS API should be the only management plane. Direct edits will be reverted on next reconciliation.
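The node resource group's default name is derived from the cluster's own identifiers, MC_<cluster resource group>_<cluster name>_<location>, and can be overridden with the node_resource_group argument on azurerm_kubernetes_cluster. A one-line sketch of the pattern:

```python
def default_node_resource_group(rg: str, cluster: str, location: str) -> str:
    """Default AKS node resource group name: MC_<rg>_<cluster>_<location>."""
    return f"MC_{rg}_{cluster}_{location}"

print(default_node_resource_group(
    "aks-multizone-rg", "aks-multizone-cluster", "eastus2"))
# MC_aks-multizone-rg_aks-multizone-cluster_eastus2
```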
09

Clean up

Destroy the cluster when done to avoid unnecessary Azure charges. AKS node VMs are billed per minute.

terraform destroy -auto-approve
The destroy operation also deletes the Log Analytics workspace and all stored logs. If you need to retain logs, export them first, or remove the workspace from Terraform state (terraform state rm azurerm_log_analytics_workspace.main) so that destroy leaves it in place.

Success Criteria

- terraform apply completes without errors and outputs the cluster name
- kubectl get nodes shows six Ready nodes: three in the system pool and three in the user pool
- Each node pool spans zones eastus2-1, eastus2-2, and eastus2-3
- The MC_-prefixed node resource group contains the VM scale sets, load balancer, and disks

Key Concepts

Understanding the topology labels is critical for scheduling zone-aware workloads. AKS writes each node's zone into the well-known topology.kubernetes.io/zone label (for example, eastus2-2), and the scheduler, topology spread constraints, and zone-aware volume provisioning all key off that label.
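One common consumer of these labels is topologySpreadConstraints. The illustrative Deployment below (the app name and image are placeholders) spreads replicas evenly across the three zones and pins them to the user pool via the role label set in main.tf:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      # Keep application pods off the tainted system pool.
      nodeSelector:
        role: user
      # Allow at most one replica of skew between any two zones.
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: nginx:1.27 # placeholder image
```

With six replicas and three zones, the scheduler places two pods per zone.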

Further Reading