Cluster Architecture L1 · INTRO ~45 min

Provision a Multi-Zone AKS Cluster with Terraform

Provision a production-grade AKS cluster across three availability zones using Terraform. Configure separate system and user node pools, enable zone-aware scheduling, and verify zone distribution using kubectl.

Objective

Use Terraform to provision an Azure Kubernetes Service (AKS) cluster with nodes distributed across three availability zones. Understand how Azure zone infrastructure maps to Kubernetes node topology labels and how zone-aware scheduling works by default. By the end you will have a working cluster, understand its resource model, and be able to verify zone distribution through kubectl output.

Prerequisites

- Terraform >= 1.5 and the Azure CLI (az), logged in to a subscription that can create resource groups and AKS clusters
- kubectl and jq for the verification steps
- An Azure Storage account for remote state, or delete the backend block in providers.tf to use local state

Steps

01

Create the project structure

Set up your Terraform project with separate files for provider config, main resources, variables, and outputs. This separation makes the module reusable and testable.

mkdir aks-multizone && cd aks-multizone
touch main.tf variables.tf outputs.tf providers.tf
02

Configure providers.tf

Pin the AzureRM provider to a version range so runs are reproducible, and include the empty features block the provider requires. The backend block stores state in Azure Blob Storage; point it at an existing storage account (the names shown are placeholders) or delete the block to use local state.

# providers.tf
terraform {
  required_version = ">= 1.5"
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.85"
    }
  }
  # Recommended: store state in Azure Blob Storage
  backend "azurerm" {
    resource_group_name  = "tf-state-rg"
    storage_account_name = "tfstateaccount"
    container_name       = "tfstate"
    key                  = "aks-multizone.tfstate"
  }
}

provider "azurerm" {
  features {}
}
03

Define variables.tf

Parameterise the configuration so the same module can provision dev, staging, and production clusters by changing variable files rather than code.

# variables.tf
variable "resource_group_name" {
  type    = string
  default = "aks-multizone-rg"
}

variable "location" {
  type    = string
  default = "eastus2"
  # Zones available: East US 2, West US 2, West Europe, etc.
}

variable "cluster_name" {
  type    = string
  default = "aks-multizone-cluster"
}

variable "kubernetes_version" {
  type    = string
  default = "1.29"
}

variable "system_node_count" {
  type    = number
  default = 3  # One per zone
}

variable "system_node_vm_size" {
  type    = string
  default = "Standard_D4s_v5"
}

variable "user_node_count" {
  type    = number
  default = 3
}

variable "user_node_vm_size" {
  type    = string
  default = "Standard_D8s_v5"
}

variable "tags" {
  type = map(string)
  default = {
    environment = "training"
    managed_by  = "terraform"
  }
}
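For instance, a production cluster can reuse this module with nothing but a variable file. The file below is an illustrative sketch; every name and size in it is a placeholder:

```hcl
# production.tfvars (illustrative values)
# Usage: terraform plan -var-file=production.tfvars
resource_group_name = "aks-prod-rg"
cluster_name        = "aks-prod-cluster"
location            = "westeurope"
user_node_count     = 6                  # two user nodes per zone
user_node_vm_size   = "Standard_D16s_v5"

tags = {
  environment = "production"
  managed_by  = "terraform"
}
```

Any variable not listed keeps its default from variables.tf.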
04

Write main.tf with zone-aware node pools

The critical configuration is the zones argument on each node pool. AKS will distribute nodes evenly across zones 1, 2, and 3 within the region. The system pool runs cluster-critical workloads (CoreDNS, metrics-server); the user pool runs application workloads.

# main.tf
resource "azurerm_resource_group" "main" {
  name     = var.resource_group_name
  location = var.location
  tags     = var.tags
}

resource "azurerm_kubernetes_cluster" "main" {
  name                = var.cluster_name
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  dns_prefix          = var.cluster_name
  kubernetes_version  = var.kubernetes_version
  oidc_issuer_enabled = true # required for the oidc_issuer_url output

  # Default node pool; AKS runs it in System mode regardless of its name
  default_node_pool {
    name                = "system"
    node_count          = var.system_node_count
    vm_size             = var.system_node_vm_size
    zones               = ["1", "2", "3"]
    os_disk_size_gb     = 128
    os_disk_type        = "Managed"
    type                = "VirtualMachineScaleSets"

    node_labels = {
      "role" = "system"
    }

    # Taint to prevent user workloads on system nodes
    only_critical_addons_enabled = true

    upgrade_settings {
      max_surge = "33%"
    }
  }

  identity {
    type = "SystemAssigned"
  }

  network_profile {
    network_plugin    = "azure" # Azure CNI
    network_policy    = "calico"
    load_balancer_sku = "standard"
  }

  # Azure Monitor / OMS integration
  oms_agent {
    log_analytics_workspace_id = azurerm_log_analytics_workspace.main.id
  }

  azure_active_directory_role_based_access_control {
    managed            = true
    azure_rbac_enabled = true
  }

  tags = var.tags
}

# User node pool, spread across the same three zones
resource "azurerm_kubernetes_cluster_node_pool" "user" {
  name                  = "user"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.main.id
  vm_size               = var.user_node_vm_size
  node_count            = var.user_node_count
  zones                 = ["1", "2", "3"]
  os_disk_size_gb       = 256
  os_disk_type          = "Managed"
  mode                  = "User"

  node_labels = {
    "role" = "user"
  }

  upgrade_settings {
    max_surge = "33%"
  }

  tags = var.tags
}

resource "azurerm_log_analytics_workspace" "main" {
  name                = "${var.cluster_name}-law"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  sku                 = "PerGB2018"
  retention_in_days   = 30
  tags                = var.tags
}
05

Define outputs.tf

# outputs.tf
output "cluster_name" {
  value = azurerm_kubernetes_cluster.main.name
}

output "kube_config_raw" {
  value     = azurerm_kubernetes_cluster.main.kube_config_raw
  sensitive = true
}

output "node_resource_group" {
  value = azurerm_kubernetes_cluster.main.node_resource_group
  # This is where AKS places VMs, disks, load balancers
}

output "oidc_issuer_url" {
  value = azurerm_kubernetes_cluster.main.oidc_issuer_url
}
06

Initialise, plan, and apply

Run the standard Terraform workflow. Review the plan carefully and confirm the zones argument appears on both node pools before applying. The apply takes approximately 8-12 minutes.

# Initialise providers and backend
terraform init

# Preview changes — review zones config in plan output
terraform plan -out=tfplan

# Apply (takes ~10 min)
terraform apply tfplan

# Retrieve kubeconfig
az aks get-credentials \
  --resource-group aks-multizone-rg \
  --name aks-multizone-cluster \
  --overwrite-existing
07

Verify zone distribution with kubectl

Check that nodes are distributed across zones using the topology.kubernetes.io/zone label. Each zone should have at least one node from both node pools.

# View nodes with zone and agentpool label columns
kubectl get nodes \
  --label-columns=topology.kubernetes.io/zone,kubernetes.azure.com/agentpool

# Expected output, extra columns omitted (6 nodes: 3 system + 3 user):
# NAME                STATUS   ZONE         AGENTPOOL
# aks-system-...0     Ready    eastus2-1    system
# aks-system-...1     Ready    eastus2-2    system
# aks-system-...2     Ready    eastus2-3    system
# aks-user-...0       Ready    eastus2-1    user
# aks-user-...1       Ready    eastus2-2    user
# aks-user-...2       Ready    eastus2-3    user

# Verify node labels in detail
kubectl get nodes -o json | jq '
  .items[] | {
    name: .metadata.name,
    zone: .metadata.labels["topology.kubernetes.io/zone"],
    pool: .metadata.labels["kubernetes.azure.com/agentpool"]
  }
'
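The same check can be automated. The sketch below is a standalone example, not part of the lab's tooling: it groups zone labels by agent pool and flags any pool spanning fewer than three zones. The embedded sample JSON stands in for live `kubectl get nodes -o json` output and is deliberately under-spread so the check fires.

```python
import json
from collections import defaultdict

# Minimal stand-in for `kubectl get nodes -o json` (under-spread on purpose).
sample = json.loads("""
{"items": [
  {"metadata": {"name": "aks-system-0", "labels": {
    "kubernetes.azure.com/agentpool": "system",
    "topology.kubernetes.io/zone": "eastus2-1"}}},
  {"metadata": {"name": "aks-system-1", "labels": {
    "kubernetes.azure.com/agentpool": "system",
    "topology.kubernetes.io/zone": "eastus2-2"}}},
  {"metadata": {"name": "aks-user-0", "labels": {
    "kubernetes.azure.com/agentpool": "user",
    "topology.kubernetes.io/zone": "eastus2-1"}}}
]}
""")

def zones_per_pool(nodes: dict) -> dict:
    """Map each agent pool to the set of zones its nodes occupy."""
    pools = defaultdict(set)
    for item in nodes["items"]:
        labels = item["metadata"]["labels"]
        pools[labels["kubernetes.azure.com/agentpool"]].add(
            labels["topology.kubernetes.io/zone"])
    return dict(pools)

for pool, zones in sorted(zones_per_pool(sample).items()):
    status = "ok" if len(zones) >= 3 else "UNDER-SPREAD"
    print(f"{pool}: {sorted(zones)} ({status})")
# system: ['eastus2-1', 'eastus2-2'] (UNDER-SPREAD)
# user: ['eastus2-1'] (UNDER-SPREAD)
```

Against the real cluster you would replace the sample with the parsed output of `kubectl get nodes -o json` and expect both pools to report three zones.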
08

Review the Azure resource model

AKS creates a "node resource group" (MC_ prefixed) that contains the actual VM scale sets, NICs, disks, and load balancers. Understanding this separation is important for cost allocation and IAM.

# List resources in the node resource group
NODE_RG=$(az aks show \
  --resource-group aks-multizone-rg \
  --name aks-multizone-cluster \
  --query nodeResourceGroup -o tsv)

az resource list --resource-group $NODE_RG \
  --output table --query "[].{Name:name, Type:type}"

# Inspect VMSS zone distribution
az vmss list --resource-group $NODE_RG \
  --query "[].{Name:name, Zones:zones}" \
  --output table
The MC_ resource group is managed by AKS. Do not modify resources in it directly — Terraform or the AKS API should be the only management plane. Direct edits will be reverted on next reconciliation.
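The node resource group's default name is derived from the cluster's own identifiers, MC_<cluster resource group>_<cluster name>_<location>, and can be overridden with the node_resource_group argument on azurerm_kubernetes_cluster. A one-line sketch of the pattern:

```python
def default_node_resource_group(rg: str, cluster: str, location: str) -> str:
    """Default AKS node resource group name: MC_<rg>_<cluster>_<location>."""
    return f"MC_{rg}_{cluster}_{location}"

print(default_node_resource_group(
    "aks-multizone-rg", "aks-multizone-cluster", "eastus2"))
# MC_aks-multizone-rg_aks-multizone-cluster_eastus2
```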
09

Clean up

Destroy the cluster when done to avoid unnecessary Azure charges. AKS node VMs are billed per minute.

terraform destroy -auto-approve
The destroy operation also deletes the Log Analytics workspace and all stored logs. If you need to retain logs, export them first, or remove the workspace from Terraform state (terraform state rm azurerm_log_analytics_workspace.main) so that destroy leaves it in place.

Success Criteria

- terraform apply completes without errors and outputs the cluster name
- kubectl get nodes shows six Ready nodes: three in the system pool and three in the user pool
- Each node pool spans zones eastus2-1, eastus2-2, and eastus2-3
- The MC_-prefixed node resource group contains the VM scale sets, load balancer, and disks

Key Concepts

Understanding the topology labels is critical for scheduling zone-aware workloads. AKS writes each node's zone into the well-known topology.kubernetes.io/zone label (for example, eastus2-2), and the scheduler, topology spread constraints, and zone-aware volume provisioning all key off that label.
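One common consumer of these labels is topologySpreadConstraints. The illustrative Deployment below (the app name and image are placeholders) spreads replicas evenly across the three zones and pins them to the user pool via the role label set in main.tf:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      # Keep application pods off the tainted system pool.
      nodeSelector:
        role: user
      # Allow at most one replica of skew between any two zones.
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: nginx:1.27 # placeholder image
```

With six replicas and three zones, the scheduler places two pods per zone.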

Further Reading