Kubernetes: Distributing Pods of a Deployment across nodes

Published May 17, 2022 9:14 am

Sometimes you need to ensure that no two pods of a deployment run on the same node. To achieve this, you can use pod anti-affinity and configure it so that a pod is not scheduled onto a node that already runs a pod of the same deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: game
  name: game
  namespace: arcade
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: game
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: game
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - game
            topologyKey: kubernetes.io/hostname
      containers:
      - image: quay.io/mdewald/s3e
        name: s3e

With this pod anti-affinity definition, the scheduler never places two pods of the deployment (more precisely, two pods labeled app: game in this namespace) on the same node.

During a roll-out, additional pods are created before old pods are removed. If you have the same number of nodes as replicas, the roll-out stalls: every node already runs a pod of the deployment, so no node satisfies the anti-affinity rule and the additional pod stays Pending. Ideally, you should therefore have more nodes available than the deployment has replicas.
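To make the stall concrete, the fragment below annotates the roll-out settings from the Deployment above with the values they resolve to for 2 replicas. It is only a sketch: the rounding rules in the comments (maxSurge rounds up, maxUnavailable rounds down) come from the Kubernetes Deployment documentation, not from the manifest itself.

```yaml
# Fragment of the Deployment spec above, annotated for replicas: 2
strategy:
  type: RollingUpdate
  rollingUpdate:
    # 25% of 2 = 0.5, rounded up: 1 additional pod is created first
    maxSurge: 25%
    # 25% of 2 = 0.5, rounded down: 0 old pods may be removed
    # before the additional pod is ready
    maxUnavailable: 25%
```

With only 2 nodes, the additional pod cannot be scheduled, and because no old pod may be removed before it is ready, the roll-out cannot make progress.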

You can work around this problem by changing from requiredDuringSchedulingIgnoredDuringExecution to preferredDuringSchedulingIgnoredDuringExecution:

podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
  - podAffinityTerm:
      labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - game
      topologyKey: kubernetes.io/hostname
    weight: 100

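However, this allows some pods of the deployment to land on the same node during a roll-out. The scheduler still tries to place new pods on nodes without a matching pod, so the pods often end up one per node again after the roll-out, but because the rule is only a preference and running pods are never moved, this is not guaranteed.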
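If you would rather keep the required rule on the Deployment, another approach is to change the roll-out strategy so that an old pod is removed before its replacement is created. This is not part of the original post and is shown here only as a sketch. The values below are an assumption chosen for the 2-replica example.

```yaml
# Fragment of the Deployment spec; the required anti-affinity rule stays unchanged
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 0        # never create a pod beyond the desired replica count
    maxUnavailable: 1  # allow one pod to be down while it is replaced
```

The trade-off is reduced capacity during the roll-out: with 2 replicas, only one pod is available while the other is being replaced.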

If you never want two pods of the same deployment to run on the same node but don’t have more nodes than replicas, one option is to migrate from a Deployment to a StatefulSet, which terminates each pod before creating its replacement:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: game
  name: game
  namespace: arcade
spec:
  replicas: 2
  selector:
    matchLabels:
      app: game
  serviceName: ""
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: game
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - game
            topologyKey: kubernetes.io/hostname
      containers:
      - image: quay.io/mdewald/s3e
        name: s3e

This ensures that no two pods of the StatefulSet are scheduled to the same node. If you have the same number of nodes as replicas in the StatefulSet, the roll-out proceeds as follows: one by one, each pod is removed and its replacement is scheduled to the node that was just freed before the next pod is removed.
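The one-by-one behaviour comes from the StatefulSet's default update strategy. As a sketch, the fragment below writes that default out explicitly. The manifest above omits these fields, so adding them is optional and should not change its behaviour.

```yaml
# Fragment of the StatefulSet spec with the default update strategy spelled out
spec:
  updateStrategy:
    type: RollingUpdate  # default: replace pods one at a time, highest ordinal first
    rollingUpdate:
      partition: 0       # default: every pod (ordinal >= 0) is updated
```

With type: OnDelete instead, the controller would only replace a pod after you delete it yourself.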