
Kubernetes Node Management - Drain, Cordon and Uncordon

Most Kubernetes engineers don’t start their day expecting to drain a node, but they often end up doing just that. Managing node availability becomes routine work that directly affects workload stability and uptime. You’ll often use drain, cordon, and uncordon when:


Scaling a cluster


Preparing for a node upgrade


Patching OS-level vulnerabilities


Replacing underlying infrastructure


Investigating issues on a specific node


Let’s go through how each of these actually behaves, with examples.


1. Drain

The kubectl drain command is used when you want to safely evict all running pods from a node and prevent new ones from being scheduled on it. This is typically used during node maintenance, upgrades, or when preparing to decommission a node.



When you run kubectl drain node2, Kubernetes performs two actions:


It marks the node as unschedulable (SchedulingDisabled).


It evicts all non-DaemonSet pods from the node.
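For example, to drain node2 the way most maintenance runbooks do (a sketch; adapt the flags to your workloads):

# Evict everything except DaemonSet pods; acknowledge that emptyDir data is lost
kubectl drain node2 --ignore-daemonsets --delete-emptydir-data

# node2 should now report Ready,SchedulingDisabled
kubectl get nodes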



Before Drain: All three nodes are healthy and ready to accept pods. node2 is running Pod C and Pod D.


Once kubectl drain node2 is executed:


Pod C is evicted and rescheduled onto node1.


Pod D is evicted and rescheduled onto node3.


node2 is marked as SchedulingDisabled so no new pods are placed there.


(Strictly speaking, pods aren’t moved: drain evicts them, and their controllers, such as Deployments, create replacements on other schedulable nodes. Bare pods with no controller simply disappear.)
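One way to watch this happen from a second terminal (pod names will be whatever your controllers generate):

# Pods on node2 terminate and replacements appear on node1/node3
kubectl get pods -o wide -w

# The node's own view, including any pods still running on it
kubectl describe node node2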


Things I learnt after getting burned:

Use --ignore-daemonsets or the command fails if DaemonSet pods are present.


Pods using emptyDir lose all data when evicted, even if they come back quickly; drain refuses to evict them unless you pass --delete-emptydir-data.


If a PodDisruptionBudget is set, drain can block until it’s safe to evict (see the sketch after this list).


Hanging drains are usually due to finalizers or stuck shutdown hooks. Use --force only if you understand the risk.


Drain marks the node as unschedulable. You must run uncordon manually to bring it back.


Draining a node that runs system pods (for example CoreDNS) can silently break networking or DNS if those pods have no tolerations or capacity to land elsewhere.
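Here is a minimal sketch of the PodDisruptionBudget interaction mentioned above. The Deployment name and selector (web, app=web) are hypothetical:

# A 3-replica Deployment; kubectl create deployment labels it app=web
kubectl create deployment web --image=nginx --replicas=3

# Never allow fewer than 2 app=web pods to be running
kubectl create poddisruptionbudget web-pdb --selector=app=web --min-available=2

# Drain now evicts app=web pods one at a time; if a replacement cannot
# start on another node, the drain blocks and retries instead of
# violating the budget
kubectl drain node2 --ignore-daemonsets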


2. Cordon

The kubectl cordon command is used when you want to stop new pods from being scheduled on a node, but keep existing pods running. This is often done before maintenance, scaling operations, or selective upgrades where you don’t want to disrupt workloads immediately.



Once kubectl cordon node2 is executed:


Pod C and Pod D continue running on node2.


No new pods will be scheduled on it.


node2 status now shows SchedulingDisabled.
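You can confirm with kubectl get nodes (output illustrative; roles, ages and versions will differ in your cluster):

kubectl get nodes
NAME    STATUS                     ROLES    AGE   VERSION
node1   Ready                      <none>   10d   v1.30.0
node2   Ready,SchedulingDisabled   <none>   10d   v1.30.0
node3   Ready                      <none>   10d   v1.30.0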


The output of kubectl get nodes after cordon and after drain looks the same, right? The difference is not in the kubectl get nodes output, but in behavior:


kubectl drain → Pods evicted


kubectl cordon → Pods stay as-is
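The quickest way to tell the two states apart is to list the pods still bound to the node:

# After cordon: Pod C and Pod D still show up here
# After drain: only DaemonSet pods, if any, remain
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=node2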


Things I learnt:

cordon only stops new pods from scheduling; it does not touch running ones.


Useful before maintenance when you don’t want pods landing on a node.


You won’t see any pod movement; it’s a passive block, not an active eviction.


Nodes stay in Ready,SchedulingDisabled until you explicitly uncordon them.


If autoscalers are active, cordoned nodes may still be targeted unless excluded (one way to exclude a node is shown below).
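If you run the standalone Kubernetes Cluster Autoscaler, one way to exclude a node is its scale-down annotation (managed autoscalers have their own equivalents):

# Tell the Cluster Autoscaler never to remove this node
kubectl annotate node node2 cluster-autoscaler.kubernetes.io/scale-down-disabled=true

# Drop the annotation (note the trailing dash) once maintenance is over
kubectl annotate node node2 cluster-autoscaler.kubernetes.io/scale-down-disabled-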


3. Uncordon

The kubectl uncordon command is used to allow new pods to be scheduled on a node that was previously marked as unschedulable. It’s typically used after a cordon or drain operation, once maintenance or upgrades are complete.



When you run kubectl uncordon node2, it marks the node as schedulable again (Ready), allowing the scheduler to place new pods on it. This has no effect on currently running or pending pods; it simply reopens the node for future pod placements.



Before uncordon: node2 is marked as SchedulingDisabled, so no new pods can land on it.


Once kubectl uncordon node2 is run:


The node is restored to a fully schedulable state.


New pods can now be scheduled on node2 again.
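In commands, the round trip is a single line plus a quick check:

kubectl uncordon node2

# STATUS goes from Ready,SchedulingDisabled back to Ready
kubectl get nodes node2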


This is the final step to return a node back to normal operation after any kind of intentional scheduling pause. Without it, your workloads may remain unintentionally unbalanced.
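The scheduler won’t proactively move anything back, so if you want node2 repopulated right away, one common trick is a rolling restart of the affected workloads (reusing the hypothetical web Deployment from earlier):

# New pods are created and spread across all schedulable nodes,
# including the freshly uncordoned node2
kubectl rollout restart deployment/web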


Things I learnt after forgetting kubectl uncordon:

uncordon re-enables scheduling on a node marked as SchedulingDisabled.


It doesn’t move any pods back; it just allows new pods to be scheduled.


Works instantly; no restart or reload needed.


The cluster might look healthy, but workloads won’t rebalance until new pods need to be scheduled.


Always verify with kubectl get nodes to ensure the node is back to Ready.
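Putting the three commands together, a typical maintenance round trip looks like this (cordon first is technically redundant, since drain cordons implicitly, but it makes the intent explicit):

# 1. Stop new pods from landing on the node
kubectl cordon node2

# 2. Evict existing pods; PodDisruptionBudgets are respected
kubectl drain node2 --ignore-daemonsets --delete-emptydir-data

# 3. ...patch, upgrade, or reboot the node...

# 4. Reopen it for scheduling
kubectl uncordon node2

# 5. Confirm it is back to Ready
kubectl get nodes node2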

 
 
 
