Place Nodes in Maintenance Mode

Table of contents

  1. The Problem
  2. The Solution
  3. Usage

The Problem

Kubernetes admins sometimes need to perform work that disrupts a node (e.g., replacing a drive, RAM, or a NIC). Before doing so, they should place the affected nodes into maintenance mode. When a node enters maintenance mode, its workloads are migrated to another available node. An admin can already do this with kubectl drain and kubectl cordon, available since Kubernetes v1.5. However, draining can be a long-running process that is sensitive to network loss between the admin's machine and the cluster nodes, so there is a need for an automated approach that runs independently of the admin's connection.

The Solution

Node Maintenance Operator (NMO) is an open source Kubernetes operator which keeps nodes cordoned and drained while a matching NodeMaintenance (nm) custom resource (CR) exists.

Generally Available

The operator provides a declarative way to run kubectl cordon NODE (mark the node as unschedulable), kubectl drain NODE (evict pods from the node), and kubectl taint NODE NAME KEY_1=VAL_1:TAINT_EFFECT_1 (add a taint to the node). It watches for created and deleted nm CRs: creating an nm CR signals that the named node should be placed into maintenance, and deleting that CR signals that maintenance should end. While a node is in maintenance mode, it is cordoned (set as unschedulable) and every pod that can be evicted is drained from it. When the node leaves maintenance mode, it is uncordoned (set as schedulable again). Detailed progress and node status are reported during maintenance.
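To make the declarative model concrete, here is a minimal toy sketch in Python. It is not the operator's code: the Node class and the reconcile function are hypothetical names invented for illustration, and the "drain" step simply empties an in-memory list rather than honoring the eviction rules a real drain follows.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a cluster node."""
    name: str
    unschedulable: bool = False
    pods: List[str] = field(default_factory=list)


def reconcile(node: Node, cr_exists: bool) -> List[str]:
    """One toy reconcile pass; returns the pods evicted in this pass."""
    if cr_exists:
        # A matching CR exists: cordon, then drain.
        node.unschedulable = True
        # Toy simplification: every pod is evicted. A real drain skips
        # pods that cannot be evicted.
        evicted, node.pods = node.pods, []
        return evicted
    # No matching CR: maintenance is over, so uncordon.
    node.unschedulable = False
    return []


if __name__ == "__main__":
    node = Node("node02", pods=["web-1", "db-0"])
    print(reconcile(node, cr_exists=True))   # CR created
    print(reconcile(node, cr_exists=False))  # CR deleted
```

The point of the sketch is that the desired state depends only on whether the CR exists, so repeating the pass is harmless and nothing relies on the admin's session staying connected.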

Usage

The operator can be installed from OperatorHub or built and run from source. See NMO's README for how to put a node into maintenance mode and take it out again, and for a fuller explanation of the nm CR status.
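As a rough illustration of what an nm CR might look like, the Python sketch below builds one as a plain dictionary and prints it as JSON. The API group and version (nodemaintenance.medik8s.io/v1beta1) and the spec field names (nodeName, reason) are assumptions that should be checked against NMO's README for the version you install; the helper name and the metadata naming scheme are hypothetical.

```python
import json


def node_maintenance_manifest(node_name: str, reason: str) -> dict:
    """Build a NodeMaintenance manifest as a dict (field names are assumed)."""
    return {
        # Assumed API group/version; verify against the installed CRD.
        "apiVersion": "nodemaintenance.medik8s.io/v1beta1",
        "kind": "NodeMaintenance",
        # Hypothetical naming scheme, chosen only for this example.
        "metadata": {"name": f"maintenance-{node_name}"},
        # Assumed spec fields: the target node and a free-text reason.
        "spec": {"nodeName": node_name, "reason": reason},
    }


if __name__ == "__main__":
    manifest = node_maintenance_manifest("node02", "Replacing a NIC")
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON manifests, the printed output could be piped to kubectl apply -f - to start maintenance, and deleting the same object would end it.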